Automatic Normalization of Temporal Expressions
Abstract
Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns for which myriad variations exist, rendering direct automated comparison difficult. The issue can occur even within records from the same dataset and is further compounded when attempting to integrate multilingual data, particularly where dates may be expressed in words rather than numbers. The problem is found in temporal metadata, whether manually entered or generated via Natural Language Processing (NLP) techniques from reports in the grey literature. Resolving and normalizing to internationally agreed standard formats enables efficient integration, interchange, search, and visualization. This paper reports on the design and implementation of a tool to normalize temporal expressions to a numerical time axis and reflects on key issues. Textual temporal expressions in seven categories of expression have been normalized: Ordinal named or numbered centuries; Year spans; Single year (with tolerance); Decades; Century with prefix; Single year with prefix; Named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish, Welsh. Methods are discussed together with an (open source) normalization tool developed in Python, and four applications of the method are discussed, along with limitations and future work. Results are presented for diverse data sets and languages. The input is a text string and a language code (ISO639-1). The output is a tab delimited file with start/end years (in ISO 8601 format), relative to Common Era (CE). The normalized outputs are provided as additional attributes along with the original data for consuming software to employ in end-user applications.
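As a rough illustration of the input/output contract described in the abstract (a text string plus an ISO639-1 language code in, a tab delimited row of start/end years out), the following minimal Python sketch handles three of the listed categories for English only. It is not the authors' tool: the function names `normalize` and `to_tsv_row`, the regular expressions, and the convention that the Nth century CE runs from year (N-1)*100+1 to N*100 are all assumptions made for this example. BCE dates, tolerances, prefixes, and named periods are not handled.

```python
import re

# Hypothetical patterns for three categories (English only); not the paper's rules.
YEAR_SPAN = re.compile(
    r"^\s*(\d{1,4})\s*(?:-|–|to)\s*(\d{1,4})\s*(?:AD|CE)?\s*$", re.IGNORECASE
)
DECADE = re.compile(r"^\s*(\d{3}0)s\s*$")
CENTURY = re.compile(
    r"^\s*(\d{1,2})(?:st|nd|rd|th)\s+century\s*(?:AD|CE)?\s*$", re.IGNORECASE
)


def normalize(text, lang="en"):
    """Return (start_year, end_year) relative to CE, or None if unmatched."""
    if lang != "en":
        return None  # this sketch covers English only
    m = YEAR_SPAN.match(text)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = DECADE.match(text)
    if m:
        start = int(m.group(1))
        return start, start + 9
    m = CENTURY.match(text)
    if m:
        n = int(m.group(1))
        # Assumed convention: the 5th century spans years 401 to 500.
        return (n - 1) * 100 + 1, n * 100
    return None


def to_tsv_row(text, lang="en"):
    """Format one tab delimited row: original text, language, start, end."""
    span = normalize(text, lang)
    if span is None:
        return f"{text}\t{lang}\t\t"  # keep the original text, leave years empty
    # Pad years to four digits, as ISO 8601 year representations require.
    return f"{text}\t{lang}\t{span[0]:04d}\t{span[1]:04d}"


if __name__ == "__main__":
    for expression in ["1200-1350", "1950s", "5th century"]:
        print(to_tsv_row(expression))
```

Keeping the original expression in the first column mirrors the abstract's point that normalized values are supplied as additional attributes alongside the original data rather than replacing it.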
Similar resources
KUL: Recognition and Normalization of Temporal Expressions
In this paper we describe a system for the recognition and normalization of temporal expressions (Task 13: TempEval-2, Task A). The recognition task is approached as a classification problem of sentence constituents and the normalization is implemented in a rule-based manner. One of the system features is extending positive annotations in the corpus by semantically similar words automatically o...
Recognition and Normalization of Temporal Expressions in Serbian Texts
This paper presents a system for recognition and normalization of temporal expressions (TEs) in Serbian texts according to the TimeML specification language. Based on a finite-state transducers methodology, local grammars are designed to recognize calendar dates, times of day, periods of time and durations, to determine the extension of detected expressions, as well as to normalize their values...
Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives
OBJECTIVE To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives. METHODS We analyzed the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three ...
Automatic Temporal Expression Normalization with Reference Time Dynamic-Choosing
Temporal expressions in texts contain significant temporal information. Understanding temporal information is very useful in many NLP applications, such as information extraction, document summarization and question answering. Therefore, temporal expression normalization, which transforms temporal expressions into temporal information, has attracted the attention of many researchers. B...
Journal
Journal title: Journal of Computer Applications in Archaeology
Year: 2023
ISSN: 2514-8362
DOI: https://doi.org/10.5334/jcaa.105